Simulation of stochastic network dynamics via entropic matching 
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Abstract 

The simulation of complex stochastic network dynamics arising, for instance, from models of 
coupled biomolecular processes remains computationally challenging. Often, the necessity to scan 
a models' dynamics over a large parameter space renders full-fledged stochastic simulations im- 
practical, motivating approximation schemes. Here we propose an approximation scheme which 
improves upon the standard linear noise approximation while retaining similar computational com- 
plexity. The underlying idea is to minimize, at each time step, the Kullback-Leibler divergence 
between the true time evolved probability distribution and a Gaussian approximation (entropic 
matching). This condition leads to ordinary differential equations for the mean and the covariance 
matrix of the Gaussian. For cases of weak nonlinearity, the method is more accurate than the 
linear method when both are compared to stochastic simulations. 
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I. INTRODUCTION 



Stochastic dynamical systems arise in many areas of science, with much recent interest 
focusing on biomolecular reaction networks that control the behavior of individual cells [1- 
4]. For instance, the expression levels of different genes in a single cell are often coupled by 
direct or indirect regulation and are subject to intrinsic (reaction) noise as well as extrinsic 
(parametric) noise [5]. Models of such systems can be based on a wide range of different 
approaches and mathematical tools [6, 7]. A full treatment of noise effects on the single- 
molecule level requires an approach based on the chemical Master equation [8]. However, 
many noise phenomena can also be understood and quantitatively analyzed within a more 
coarse-grained continuum description based on stochastic differential equations or, equiv- 
alently, the associated Fokker-Planck equations [7-9]. For instance, such a description is 
adequate to describe the excitable system behavior of a natural genetic circuit whose exci- 
tations are triggered by noise [10], or the synchronizing effect of a coupling between noisy 
synthetic "repressilators" [11, 12]. 

A recurrent problem in the analysis of these systems is that the quantitative behavior of 
the theoretical models must be computed not only once, but for a large number of different 
parameter values. One common situation is that, for a given model, one would like to 
determine a "phase diagram" of system behaviors over the entire parameter space, in order 
to understand the model on a theoretical level. Similarly, when models are leveraged for the 
interpretation of experiments, model fitting to data is required to infer system parameters 
or to discriminate between different variants of the model. In quantitative biology, there is 
currently a clear need for systematic and efficient methods to reverse engineer models from a 
limited number of noisy experimental observations [13, 14]. Any such method requires as an 
essential ingredient an efficient technique for the forward simulation of the observables. For 
discrete stochastic models based on the chemical Master equation, recent progress in this 
direction has been made by using spectral methods [15]. Also, efficient sampling techniques 
have been developed for rare event problems [16, 17]. Here, we focus on continuum models 
and develop an approximation scheme that can efficiently capture the stochastic dynamics 
of the "typical events" even in larger dynamical systems. 

For a given nonlinear stochastic system, we consider the dynamics of its probability dis- 
tribution over state space. Our scheme is reminiscent of the linear noise approximation 
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[8, 9] in that it approximates the probability distribution at each point in time by a Gaus- 
sian distribution. The essential difference is in the criterion used to match the Gaussian 
approximation to the true distribution. In our method, we take a more global stance at the 
matching process, by invoking the Kullback-Leibler divergence [18] as accuracy measure. We 
minimize the Kullback-Leibler divergence at each point in time to obtain a set of ordinary 
differential equations for the parameters of the Gaussian distribution which best approxi- 
mates, in this sense, the current probabilistic state of the system. We refer to our matching 
condition as entropic matching. It was recently proposed as a general method to construct 
simulation schemes for partial differential equations [19]. Here, we develop and test this 
idea for the case of stochastic differential and Fokker-Planck equations. For the examples 
considered here, we find the method to be more accurate and robust than the linear noise 
approximation, while the computational complexity is comparable. In the following, we first 
formulate the theoretical problem, then describe the method in detail, and finally illustrate 
and test it on simple example systems. 

II. PROBLEM STATEMENT 

We consider a nonlinear stochastic system with N variables q with i = 1, N. We think 
of the Ci as concentration variables for different molecular species in a biomolecular network, 
however the method developed below is not specific to these systems. The interactions 
between the molecular species are specified by a set of dynamical equations, which depend 
on a set of parameters, which we collectively denote as a. Throughout this article, whenever 
we omit an index, we refer to the vector containing all elements of the respective set. The 
dynamical equations can be written as stochastic differential equations (SDE) 

c(t) = ^c(t) = f(c(t),a)+ V (t), (1) 

where / is the nonlinear function regulating each node of the biomolecular network and rj 
represents a random process characterizing the system's intrinsic fluctuations. We can think 
of a single trajectory obtained by solving Eq. (1) (with one realization of the noise process 
t](t)) as the biochemical trajectory of a single cell, and an ensemble of such trajectories as the 
corresponding dynamics for a population of cells. This picture relies on the assumption that 
different cells' trajectories are uncorrelated, i.e. that there is no intercellular communication. 
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Within the framework of SDE's, extrinsic cell-to-cell noise would need to be accounted for by 
specifying a probability distribution for the parameters a, which can either be static or have 
a dynamics by itself. For simplicity, we will ignore extrinsic noise in the following. From 
the SDE system (1) one can then directly obtain [8, 9] a Fokker-Plank Equation (FPE) of 
the form 

d t P(c, t) = -d c [f(c, a) P(c, t)} + \d 2 c [X P(c, t)\ , (2) 

where X represents the covariance of the random process denoted by t] in Eq. (1). Instead of 
deriving the FPE from a phenomenological set of SDE's, one can also start from the chemical 
Master equation for a given set of microscopic biomolecular reactions and then approximate 
the discrete stochastic process by a continuous drift and diffusion process [8, 9]. 

Either way, given a FPE for our biomolecular network, one central task is to be able 
to determine the values for the parameters a for which the stochastic model best describes 
experimental data that is taken at a set of time points tj. For instance, within a Bayesian 
approach to infer the parameter values from the data, it is necessary to calculate the posterior 
density 

p , , . Pja)^ P(c\a,t 3 ) 

P(a\c,t) = — - T (3) 

JdaP(a)U;, ]P{c\a,tj) 

where P{a) represents the prior knowledge about the parameters of the system. Thus, the 
posterior density can be obtained once we have calculated the "forward solution" P(c\a,tj) 
for all relevant time points. The direct solution of this problem is to simulate uncorrelated 
paths of the system given by Eq. (1) using an appropriate discretization procedure [20] and 
reconstruct the probability density function for the whole population. This approach is 
very computationally heavy and thus unsuitable for large systems. However, in our test 
applications below (which are small systems), we use this exact approach as a reference. 

The proposed method relies on approximating the true distribution for P(c\a, tj) with a 
Gaussian distribution 

P(c\a,t j ) = g(c-c(t),C(t)), (4) 

G(c-c,C) = - f L=e-^^ c - 1 ^\ (5) 
V |27rC| 

Here, \C\ = det(C) and c^d = J2iLi c idi is a scalar product over the space of network 
state vectors. In the Gaussian approximation, knowledge of the temporal dynamics of the 



dynamics of its average c and covariance matrix C is sufficient, which simplifies not only 
analytic but also numerical calculations. The proposed method thus consists of calculating 
c(t) and C(t) such that the Gaussian approximation best represents P(c\a,t) in an informa- 
tion theoretic sense. The resulting equations can then be efficiently computed numerically 
and the resulting approximation used to infer the desired parameters. How well the true 
non-Gaussian probability distribution can be approximated by distribution (4) is critical to 
the performance of our algorithm, as we will see. 

III. METHOD 
A. Background 

Since we are approximating P(c\a,tj) by the Gaussian (4), only the first two moments 
need to be calculated for all times. Hence we seek closed ordinary differential equations 
(ODEs) for the first two moments, without requiring the calculation of the full probability 
distribution. They are given by two functions, f 5 and fc such that: 



The usual approach is to linearize the function / in Eq. (2). Then, assuming the initial 
condition is itself Gaussian, the evolution of the probability distribution consists of nothing 
more than a series of linear transformations. Thus it is possible to quickly compute linearized 
evolution equations for the Gaussian approximation: 



where J is the Jacobian of the function / at the current concentration c(t). In our numerical 
examples below, we use this linear noise approximation as a reference for comparison to our 
proposed method. 

With our method, we now aim to derive evolution equations of the form (6) without 
an a priori linearization of /, which will better preserve the characteristics of the original 
nonlinear system. 




(6) 



m = m 

C(t) = J{t)C{t) + C(t) J T (t) + X(t) 



(7) 
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B. Proposed method 



We will construct f 5 and fc using the concept of entropic matching of evolving distribu- 
tions [19]. A general outline of the method is to assume an initial Gaussian distribution for 
c, evolve it according to the nonlinear function / and then find the Gaussian distribution 
which best matches it in an information theoretic sense. This allows for the derivation of 
ODEs for the Gaussian parameters conforming to the aforementioned criterion. 

An initial Gaussian distribution P(c) = Q{c — c, C) at time t characterized by c = c(t) 
and C = C(t) will evolve according to the dynamics described by the nonlinear function /. 
For now we assume this evolution to be deterministic. Intrinsic noise shall be dealt with 
later. After an infinitesimal time step St, the evolved distribution function P'(d) will be: 



p'(c') = g(c-c,c)i 



-Stc 



dc 



(8) 



dc 

according to the rule of change of variables for probabilities, where c(t + St) = d = c + Stc 
and thus \dd/dc\ c=c i = |1 + St df/dc\ c=c i to linear order in St. P'(d) is in general not of 
Gaussian functional form. Obviously a representation such as P'(c') = Q(d — d, C) with 
an updated d = c(t') and C = C(t') at t' = t + St is no longer exact. However, it is possible 
to obtain values for d and C such that the information content of P'(d) is represented 
as well as possible. Information theoretical considerations [21-23] single out the Maximum 
Entropy Principle for this. This principle is equivalent to requiring a minimal Kullback- 
Leibler divergence [18] of Q{d — d, C) to P', or a minimal relative Gibbs free energy [24]. 
All these measures (relative entropy, Kullback-Leibler divergence, and Gibbs free energy) can 
be regarded as a measure of the information theoretical distance between two distributions, 
although they are not a metric in the strict mathematical sense due to their asymmetry with 
respect to the different roles of the matching and matched distributions. 

To obtain such a maximally informative Gaussian we entropically match the parameters 
of the Gaussian Q' to the time evolved distribution P' by minimizing the relative entropy 
(or Kullback-Leibler divergence): 

S(G', P ') = -j dd Q{d - d, C) log g(C p 7 ( y /} (9) 

We can consider this expression as a functional to be minimized with respect to all degrees 
of freedom of our matching distribution, or simply as a function of the two parameters d and 



C to be minimized with respect to their finite number of degrees of freedom. The minimum 
of this functional will define a Gaussian distribution which has maximal information content 
about the time evolved distribution P' subject to a deterministic nonlinear evolution. 

Defining S(Q', P') = S(d, C) since the Kullback-Leibler divergence is now a function 
only of the Gaussian parameters, Eq. (9) is explicitly 

S(c',C) = 

G(a - c(t),C(t))| Ci=c /_ 5 ^ 
df 



dd G(d - d, C) log 



J dc' g{ c ' -c',C) 



1 , \°'\ , 

2 log ^+ log 



1-St 



dc 



(10) 



+~(c' - c') T C'-\c' - c') - \ (c - c) T C~\c - C)\ c=c ^stc 

where P' was expressed as a function of c = d — Stc. By discarding terms of order St 2 , 
the entropic divergence between the Gaussian Q' and the time evolved former Gaussian P', 
Eq. (9), can be brought into the from 



S(c- C')^ lQ g(^) + 2 tr [C'C^-1 



-5t(2 - Cr x <y - C'C- 1 ) 



dc 



(11) 



+ \{c ~ cfC-\d - c) - St(d - cfC- 1 </(c0> c , , 



where ( • ) c , denotes the expectation value over G'(d) = G{d — c', C). The desired values of c' and 
C are the minima of Eq. (11). 

To take intrinsic noise into account, we model it as a Gaussian of zero mean and St 
accumulated covariance matrix. This assumption is not necessary but is general enough for 
our purposes and allows us to go further with the calculations analytically. We define this 



noise as: 



st 



dt' dt" Ut')i k (t")) 



(12) 



To take these fluctuations into account we consider the final probability distribution to be 
a sum of this random variable with the maximally informative Gaussian, which corresponds 
to a convolution: 
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P(c') = J di G(d - <! + £, *<W) (13) 

= Q{d -c',C + X6t), (14) 

where c' and C" were obtained by the entropic matching process. This noise addition in 
an Ito-like fashion provides the final Gaussian approximation for time if — t + St including 
intrinsic noise. This procedure can then be be applied again and again to generate subsequent 
time steps, creating an iterative scheme to track the evolution of c(t) and C(t). Having such 
a scheme, we can take the limit St — > and obtain an ODE for the time evolution of c and 
C of the form of Eq. (6), from which f 5 and fc can be read off. 

Minimizing Eq. (11) with respect to c'(t) and C'(t) we obtain simple equations for the 
evolution of both averages and covariances. By adding the required noise term to the 
evolution of the covariance matrix and taking the limit St — > we obtain the final ODEs for 
the evolution of the Gaussian approximation of our system: 



C 



(f(c)} 

df 
dc 



\dc 



(15) 



These are the general equations we sought. Since the Gaussian expectation value of a 
nonlinear function is usually analytically inaccessible in a closed form we now develop an 
approximation method which will make numerical calculations tractable. 

One may expand the nonlinear function determining the dynamics around its average as: 



N 



m = E 



1 d n f 



n=0 



n\ dc r ' 



c-c 



(f)c = /(C) + 



dc 



df 
dc 



Iff 
2 dc 2 

Iff 
2 dc 3 



+ 



C + ... 



C + ... 



(16) 

(17) 
(18) 



This is justified when the spread of the PDF around its average is small and highlights 
why in many cases the linear approximation works, as it corresponds to the first order terms 
in this series and is thus valid in regimes where noise is very small. In index notation we 
can write the expansion as: 
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(fi)c = ^ + 



1 d 2 f t 



Cki + • • • 



(19) 




c=c 



2 dc k dci c= - 
| 1 d 3 f, 
2 dcj dck dci 



c=c 



Cm + ■ ■ ■ 



(20) 



If we were to take only the first order in the expansion (16) we would recover the commonly 
used linear noise approximation, Eq. (7). Our derivation however shows that the correct 
equation for linearised evolution includes the full Gaussian expectation values and thus we 
expect a significant improvement in performance by taking additional terms in this series. 
In a practical case it appears that with only the second order approximation significant 
improvements are already visible over the linear approximation, because the correlation 
structure is taken into consideration in the evolution of the parameters. In our numerical 
examples presented in the following section, we use the expansion up to second order as in 



Thus, when the method is applicable, it will be several orders of magnitude faster than a 
stochastic simulation. It remains to show under which situations it is applicable. In the next 
section we will investigate the performance of the algorithm with some numerical examples. 

IV. TEST APPLICATIONS 

The ODEs of the entropic matching algorithm were integrated with the Adams-Moulton 
method. The usage of this implicit integration scheme is appropriate here due to its numer- 
ical robustness, which can successfully deal with the stiffness of the ODEs in the general 
nonlinear case. The results were compared to stochastic simulations of the system dynamics 
using the stochastic analogue of the Euler scheme and to the linear noise approximation (7). 

A. Van der Pol oscillator 

The van der Pol oscillator was chosen as an initial test case. Here it is possible to exactly 
calculate (/(c)) c = f dcf(c)Q(c — c,C), the expection value of the nonlinear function / 
describing deterministic part of the dynamics due to the vanishing of all derivatives in its 
Taylor expansion of order three and higher. Thus, the only approximation made in this case 



Eq. (19). 
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is the assumption of Gaussianity. Furthermore, a single parameter controls the nonlinearity 
strength, allowing one to investigate the algorithm's performance for different regimes. 
The van der Pol dynamics are described by: 

Xi = - xf)ii - ufei + S ^lij{xj - Xi) + (21) 

where \x controls the nonlinearity, 7 determines the connectivity of the network and u 
controls individual oscillation frequency of each node. £ <r- > G(£,X) represents white noise 
intrinsic to the system (in the numerical experiments only diagonal covariance matrices were 
considered for the white noise). By doing the usual substitution y = x it is possible to bring 
the dynamics into a form suitable for numerical integration. It then becomes clear that 
the fourth order derivative tensor vanishes everywhere and thus it is possible to calculate 
the expectation value of the nonlinear function exactly with little computational effort by 
taking only the first two terms of the expansion; cf. Eq. (17). The parameter values used 
for the simulation were 7^ = except 70,1 = 2, 7^0 = 5, 72,1 = 3; xq = 2, 0, 0; yo = 0, 1, 2; 
oj — 2, 1, 2. The intrinsic noise (12) was characterized by a diagonal covariance matrix with 
diagonal entries set to 0.1. 

As a term of comparison we simulated the system for a quasi-linear case, [i = 0.05, where 
all three methods should perform identically. This is clear from the second panel of Fig. 1, 
where the averages coincide perfectly for the visualized species. All three methods used the 
same initial values for the averages. The covariance matrix was initialized as the value of the 
intrinsic noise level for both our method and the linear noise method, while in the stochastic 
simulation all paths begin at the same initial value and therefore the initial variance is null. 
In the third panel, where the standard deviation is plotted for one of the species, it is clear 
that after some time all three methods also produce identical values. 

The stochastic simulation consisted of 10 3 runs of the system with time step 5t = 10 -3 . 
Such a small time step was necessary to ensure correct results with the Euler method. The 
other methods consisted of a single run of the Adams-Moulton method with 5t = 10~ 3 . 
In this case, however, decreasing St to 10 _1 had no effect in the accuracy of the obtained 
trajectories. It must be noted that for the Gaussian methods the elements of the covariance 
matrix must be followed in addition to the average, meaning that we must follow 3JV + Af2 
numbers instead of N. In practice, the Gaussian methods were two orders of magnitude 
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FIG. 1: First panel, stochastic simulation with 1000 runs (some individual runs are plotted) of 
system (21) and corresponding average value for the first species of a system with [i = 0.05 and 
N = 3. Second panel, comparison of stochastic average value (solid) with prediction using Eq. (15) 
(dashed) and Eq. (7) (dotted). Third panel, comparison of standard deviation for the stochastic 
simulation (solid) with prediction using Eq. (15) (dashed) and Eq. (7) (dotted). 

faster than the stochastic simulations at the same time step (which could be safely reduced 
for the former, yielding an order of magnitude further speed increase). 

In the first three panels of Fig. 2 it is clear all algorithms perform well since the true 
probability distribution remains approximately gaussian even at a late time t = 10. Since 



11 




FIG. 2: Final histogram of stochastic simulation with 1000 runs of system (21) with [i = 0.05 
and N = 3 and corresponding prediction using Eq. (7) for time t = T, in this case T = 10. First 
three panels represent the three components of the system. Last panel, histogram of chi squared 
distribution for the stochastic simulation data compared to the x 2 distribution with 3 degrees of 
freedom. 
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only projections of multivariate gaussians can be visualized, we calculated the \ 2 probability 
distribution, which can help us verify if the full distribution (i.e., taking into account the 
full correlation matrix) does indeed match a Gaussian with the predicted parameters. If the 
prediction matches the simulation, a histogram of % 2 = (F — c p )^C J 7 1 (F — c p ), where c p and 
C p are the predicted average and covariance, should match the chi squared distribution for 
a Gaussian Q(c, C) with N degrees of freedom, where N is the number of dimensions of the 
system. In the fourth panel of Fig. 2 it is clear that this is the case for \x = 0.05. 

The case with /i = 1.5 is analyzed next. The system now exhibits clear nonlinear behavior, 
although it is not predominant. Comparing the results of the entropically matched scheme 
with those of the linear noise approximation, Fig. 3, the average trajectory approximates 
the stochastic simulations better. Dampening due to the progressive loss of synchronization 
in individual stochastic trajectories is apparent only in our method. This is due to the 
nonlinear ity and is evident by looking at the second order term in Eq. (17) where the 
Hessian contains terms proportional to —fiXi and —fiyi and is multiplied by the covariance 
matrix. Thus there is a damping proportional to the magnitude of the fluctuations. 

Regarding the covariance matrix (third panel in Fig. 3), in the linear noise approximation 
the estimate for the variance oscillates around the true value, under or overestimating it, until 
finally losing synchrony. In contrast, our method consistently underestimates the variance 
(inherent to the Gaussian approximation, since the true distribution exhibits "fat tails", 
Fig. 4, first panel) but always remains synchronous with the true value. Thus we consider 
the estimate produced by entropic matching to be more robust. It has been verified that 
in some cases the covariance matrix diverges in the linear noise approximation while the 
entropically matched scheme still produces reasonable results. 

Regarding the histograms, Fig. 4, we note that the distribution remains unimodal for two 
components of the system and bimodal for one. Even though the unimodal distributions 
are non Gaussian our method still approximates their average and standard deviation well, 
which is the intended behavior. In the fourth panel it is visible that the multivariate gaussian 
no longer fully describes the system, due to the underestimation of the elements of the 
covariance matrix. However, the histogram still exhibits a quantitative shape which indicates 
the system is approximately Gaussian. If the target distribution would be completely non 
Gaussian, a flat histogram would be expected. 

In the case of high nonlinearity (// ^> 1) the Gaussian approximation breaks down as 
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FIG. 3: First panel, stochastic simulation with 1000 runs (some individual runs are plotted) of 
system (21) and corresponding average value for the first species of a system with /x = 1.5 and 
N = 3. Second panel, comparison of stochastic average value (solid) with prediction using Eq. (15) 
(dashed) and Eq. (7) (dotted). Third panel, comparison of standard deviation for the stochastic 
simulation (solid) with prediction using Eq. (15) (dashed) and Eq. (7) (dotted). 

the tails of the target distribution gain significant probability mass and thus contribute 
more heavily to the trajectory, which means they cannot be ignored any more. Further- 
more bimodal distributions arise, which cannot be accurately described by a multivariate 
Gaussian. 
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FIG. 4: Final histogram of stochastic simulation with 1000 runs of system (21) with = 1.5 and 
N = 3 and corresponding prediction using Eq. (7) for time t = T, in this case T = 10. First 
three panels represent the three components of the system. Last panel, histogram of chi squared 
distribution for the stochastic simulation data compared to the x 2 distribution with 3 degrees of 
freedom. 
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B. Gene network 



As a final example, a simple genetic network was simulated. The simulated network 
consists of a system of mutually repressing genes inspired by the repressilator [11], which 
exhibits sustained oscillations for a wide range of parameter regimes. The dynamics of gene 
expression are simulated using hill type activation functions, where each gene is repressed 
by another in the circuit: 

V, • Q (22) 



c] + k 



where j = (i+1) mod N, k is the half activation level, and n is the binding cooperativity 
(in a sense, the nonlinearity parameter). 

However, in this case d is not Gaussian and c-independent noise but rather of Poissonian 
nature, which means that the noise covariance increases linearly with c. In this case it is 
still possible to apply the previously derived algorithm by defining a new variable p = log c 
such that 

^.rig-pi 

ft = eS^- A + & < 23 > 
where now £ -r- > £/(£, X). This means that Q has a log normal distribution which has the 
property a\ oc (c), which is more realistic. 

The nonlinear function was Taylor expanded up to second order, which means that in this 
case there is also an approximation error in the expectation value of the function. However, 
it is sufficient to yield significantly better results than the linear noise approximation for low 
cooperativity (i.e., n < 3). 

We have examined the case of n = 2 with uncorrelated intrinsic fluctuations of magnitude 
0.1. As evidenced by Fig. 5, our method again outperforms a linear approximation at 
predicting the average value (middle panel). The standard deviation was overestimated, 
but as before, our estimate appears to be more robust. From the histograms in Fig. 6 we 
can see the distributions remain unimodal, which suggests that in this case the algorithm's 
performance could be improved by improving the series expansion used to represent the 
nonlinear function. However, as is evidenced by the \ 2 distribution in the last panel, the 
obtained estimate at the end of the simulation is relatively good. 
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FIG. 5: First panel, stochastic simulation with 1000 runs (some individual runs are plotted) of 
system (23) and corresponding average value for the first species of a system with n = 2 and 
N = 3. Second panel, comparison of stochastic average value (solid) with prediction using Eq. (15) 
(dashed) and Eq. (7) (dotted). Third panel, comparison of standard deviation for the stochastic 
simulation (solid) with prediction using Eq. (15) (dashed) and Eq. (7) (dotted). 

V. DISCUSSION AND CONCLUSIONS 

This paper proposes an algorithm to predict the evolution of an ensemble of biochemical 
networks subject to stochastic fluctuations. The algorithm is based on entropic matching, a 
strategy developed in [19] to construct simulation schemes based on the information theo- 
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FIG. 6: Final histogram of stochastic simulation with 1000 runs of system (23) with n = 2 and 
N = 3 and corresponding prediction using Eq. (7) for time t = T, in this case T = 10. First 
three panels represent the three components of the system. Last panel, histogram of chi squared 
distribution for the stochastic simulation data compared to the x 2 distribution with 3 degrees of 
freedom. 
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retical concept of the Maximum Entropy Principle. For each step in the time evolution of a 
system, the probability distribution describing the state of system is matched to a Gaussian 
and thus ODEs for the first and second moment of the distribution are derived (Eq.(15)). 
The evolution of these moments then tracks the average and covariance of the system pa- 
rameters. We have implemented these equations numerically by resorting to a series method 
to compute the Gaussian expectation values (Eq. (19)). 

By deriving the equations for the evolution of a Gaussian approximation to the full prob- 
ability distribution without linearizing the system, a more accurate approximation method 
was developed which can be used in a wider parameter regime than the linear approximation 
with good results. In the cases where the system's intrinsic noise is very low or the network 
dynamics are approximately linear, both methods work equally well. For moderate noise 
and weak nonlinearity, our method outperforms the linear approximation. 

This algorithm proved to be much faster than a stochastic simulation due to a number 
of factors. To begin with, numerical algorithms for the solution of SDEs cannot guarantee 
convergence as fast as standard ODE routines. As an example, the commonly used stochas- 
tic Euler scheme has an order of convergence of merely 0.5 whereas the Adams-Moulton 
method we use below has an order of convergence of 4. Algorithms with even faster con- 
vergence orders are available and simple to implement. On the other hand, more advanced 
algorithms for SDEs yield marginally better convergence at the cost of high complexity in 
terms of implementation [20]. The upshot is that much smaller time steps must be used for 
a stochastic simulation, thereby increasing run time. 

A second important point is the convergence of the stochastic method itself. To sam- 
ple from the probability distribution of interest with good accuracy, one must obtain sev- 
eral thousands of paths from the aforementioned SDE simulation with variance scaling as 
1 / V^Vruns- Importance sampling might mitigate this problem somehow, but the number of 
paths required is still several orders of magnitude high. On the other hand, the currently 
proposed method requires only one evaluation. 

A possible complication with the currently proposed method might be the calculation 
of higher order derivatives. However, current methods of algorithmic differentiation [25] 
allow a user to calculate derivative tensors of any order even when the derivatives are not 
analytically accessible. The calculation of these tensors is comparable in complexity to the 
calculation of the original function, being slower by only a fixed constant depending on the 
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order of the derivative. 

In the case of strong nonlinearities, stochastic simulations are still the preferred method. 
Clearly, multimodal or highly asymmetric target distributions will be poorly approximated 
by a multivariate Gausian. To remedy this, a possibility would be the use of a Gaussian 
mixture to represent the different modes of the probability distribution. However, this 
approach brings with it a number of problems as the entropic divergence cannot be computed 
analytically in this case. Additionally the number of parameters to be determined would 
scale at least as L (N '+ JV( ^ +1 ) + 1) ) where L is the number of Gaussian mixture components. 
This becomes unfeasible for a larger number of mixture components. 

In practice, this procedure may be sufficient to derive an efficient inference scheme for 
stochastic nonlinear networks, an issue left for the future. 
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